Multi-label ASRS Dataset Classification Using Semi Supervised Subspace Clustering
Abstract
Text classification has been studied extensively, and much of that work focuses on one particular characteristic of text data: multi-labelity. It arises because a document may be associated with multiple classes at the same time. As a consequence, traditional binary or multi-class classification techniques perform poorly on multi-label text data. In this paper, we propose a text classification technique that takes this characteristic into account and performs well on such data. Our multi-label text classification approach extends our previously formulated [3] multi-class text classification approach, SISC (Semi-supervised Impurity based Subspace Clustering). We call the new classification model SISC-ML (SISC Multi-Label). Empirical evaluation on the real-world multi-label NASA ASRS (Aviation Safety Reporting System) data set shows that our approach outperforms state-of-the-art text classification and subspace clustering algorithms.
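As a small illustration of the multi-label setting described in the abstract, the sketch below encodes each document's label set as a 0/1 indicator row, which is a common way to represent such data. It is not the SISC-ML method. The report IDs, label names, and the helper `label_indicator_matrix` are hypothetical and are not taken from the ASRS data set.

```python
# Hypothetical toy data: report IDs and label names are illustrative only,
# not taken from the ASRS data set.
reports = {
    "report_1": {"weather", "equipment_failure"},
    "report_2": {"communication"},
    "report_3": {"weather", "communication", "equipment_failure"},
}


def label_indicator_matrix(docs):
    """Encode each document's label set as a 0/1 row over the sorted label names."""
    labels = sorted(set().union(*docs.values()))
    rows = {
        doc: [1 if label in doc_labels else 0 for label in labels]
        for doc, doc_labels in docs.items()
    }
    return labels, rows


labels, rows = label_indicator_matrix(reports)

# Documents with more than one active label cannot be described by a single class.
multi_label_docs = [doc for doc, row in rows.items() if sum(row) > 1]

print(labels)
for doc, row in rows.items():
    print(doc, row)
print("documents with several labels:", multi_label_docs)
```

A single-label (multi-class) classifier would have to pick exactly one column per row, so any row with more than one 1 is necessarily predicted incompletely.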
Similar resources
Semi-supervised Multi-label Classification - A Simultaneous Large-Margin, Subspace Learning Approach
Labeled data is often sparse in common learning scenarios, either because it is too time-consuming or too expensive to obtain, while unlabeled data is almost always plentiful. This asymmetry is exacerbated in multi-label learning, where the labeling process is more complex than in the single-label case. Although it is important to consider semi-supervised methods for multi-label learning, as it ...
A Novel Multi Label Learning Based on Clustering Integrated Ensemble Classifier Chain Micro Prediction Models
Most real-world problems involve assigning multiple target labels to instances. The proposed model aims to increase accuracy by incorporating supervised and semi-supervised learning. K-Means clustering is employed, which creates K clusters based on the initialization of cluster centroids. Datasets are clustered based on their distribution in Euclidean space. Clust...
Vinayaka: A Semi-supervised Projected Clustering Method Using Differential Evolution
Differential Evolution (DE) is an algorithm for evolutionary optimization. Clustering problems have been solved using DE-based clustering methods, but these methods may fail to find clusters hidden in subspaces of high-dimensional datasets. Subspace and projected clustering methods have been proposed in the literature to find clusters that are present in subspaces of a dataset. In this pap...
Feature Selection based Semi-Supervised Subspace Clustering
Clustering is the process of assigning a set of n objects to clusters (groups). Dimensionality reduction techniques help increase the accuracy of clustering results by removing redundant and irrelevant dimensions. In most situations, however, objects can be related in different ways in different subsets of the dimensions. Dimensionality reduction tends to get rid of such rel...
A Scalable Clustering-Based Local Multi-Label Classification Method
Multi-label classification aims to assign multiple labels to a single test instance. Recently, more and more multi-label classification applications have arisen as large-scale problems, where some or all of the numbers of instances, features, and labels are large. To tackle such problems, in this paper we develop a clustering-based local multi-label classification method, attempting to reduce the probl...